Update worldstate state change behavior in case of FCU #5699
Conversation
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
I conducted a test on mainnet using this PR and version 23.4.4. During the experiment, I halted Prysm on both instances for two hours. The node running this PR recovered successfully, whereas the node running 23.4.4 saw a continuous decline in performance due to the lack of worldstate updates and eventually crashed. I also tested different consensus clients with different scenarios.
Also, setting a finalized block to the last finalized block on newPayload calls is a bit confusing.
Signed-off-by: ahamlat <ameziane.hamlat@consensys.net>
Signed-off-by: Ameziane H <ameziane.hamlat@consensys.net>
This looks good to me, but we should call out explicitly that we are using backward sync as an asynchronous way of moving head. We would probably benefit from having an explicit behavior for this whether it is part of backward sync service or not. As it is, when we encounter this condition, I believe we are going to unnecessarily download blocks we already have in blockchain storage.
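For illustration, the kind of short-circuit described above might look like the sketch below. Everything here is hypothetical: `Blockchain`, `Block`, `Hash`, and `downloadFromPeers` are stand-ins, not Besu's actual backward sync API.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch only: reuse blocks already in local storage instead of
// re-downloading them from peers when backward sync is used to move the head.
class BackwardSyncSketch {
  interface Blockchain { Optional<Block> getBlockByHash(Hash hash); }
  record Block(Hash hash) {}
  record Hash(String hex) {}

  private final Blockchain blockchain;

  BackwardSyncSketch(final Blockchain blockchain) {
    this.blockchain = blockchain;
  }

  CompletableFuture<Block> fetchBlock(final Hash blockHash) {
    // If the block is already in blockchain storage, skip the network fetch.
    return blockchain
        .getBlockByHash(blockHash)
        .map(CompletableFuture::completedFuture)
        .orElseGet(() -> downloadFromPeers(blockHash));
  }

  private CompletableFuture<Block> downloadFromPeers(final Hash blockHash) {
    // Placeholder for the real peer download path.
    return CompletableFuture.failedFuture(
        new UnsupportedOperationException("network fetch not shown"));
  }
}
```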
```diff
-      if (maybeHeadHeader.isPresent()) {
+      if (maybeHeadHeader.isPresent()
+          && Math.abs(maybeHeadHeader.get().getNumber() - chainHead) < 500) {
         LOG.atDebug()
```
this is only necessary for bonsai, perhaps we can defer to the WorldStateArchive to provide this limit? eventually an archive-friendly version of bonsai would not need this also
if we use backward sync mechanism as an asynchronous worldstate move, we should probably make backward sync aware of the fact that we already have those blocks. That is an optimization though and we can do that in a subsequent PR.
I removed this modification and reduced the scope of this PR. Normally it will be enough to fix the problem (testing now).
```diff
-      if (maybeHeadHeader.isPresent()) {
+      if (maybeHeadHeader.isPresent()
+          && Math.abs(maybeHeadHeader.get().getNumber() - chainHead) < 500) {
         LOG.atDebug()
```
Can we replace the number here with a named constant to describe where it comes from?
Good catch. Indeed, I used this number for the tests, but we should definitely use the max layers value that we have for Bonsai.
Line 42 in ec6a6b1:

```java
public static final long RETAINED_LAYERS = 512; // at least 256 + typical rollbacks
```
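For illustration, the distance check from the hunk above could read the named constant instead of the literal 500. A minimal sketch, assuming the check lives next to a constant mirroring `RETAINED_LAYERS` (method and class names here are illustrative):

```java
// Sketch: the magic number 500 replaced by the Bonsai retained-layers constant,
// so the origin of the limit is self-documenting.
class WorldStateMoveCheck {
  // Mirrors the constant shown above; the real value is configurable.
  static final long RETAINED_LAYERS = 512; // at least 256 + typical rollbacks

  // Only move the worldstate directly when the target is close enough
  // that the required trie logs are still retained.
  static boolean isWithinRetainedLayers(final long headerNumber, final long chainHead) {
    return Math.abs(headerNumber - chainHead) < RETAINED_LAYERS;
  }
}
```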
isn't that configurable though?
Yes, there is a flag to change this value.
if we defer to a WorldStateArchive method that provides this limit, we can skip the behavior for forest since it should not have a worldstate move issue.
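A sketch of what that deferral could look like. The `getWorldStateMoveLimit` method and these class names are hypothetical, not part of Besu's actual `WorldStateArchive` interface:

```java
import java.util.Optional;

// Hypothetical extension of the archive interface: Bonsai reports its
// retained-layers limit, while forest reports no limit (no move issue).
interface WorldStateArchiveSketch {
  Optional<Long> getWorldStateMoveLimit();
}

class BonsaiArchiveSketch implements WorldStateArchiveSketch {
  static final long RETAINED_LAYERS = 512;

  @Override
  public Optional<Long> getWorldStateMoveLimit() {
    return Optional.of(RETAINED_LAYERS); // bounded by retained trie logs
  }
}

class ForestArchiveSketch implements WorldStateArchiveSketch {
  @Override
  public Optional<Long> getWorldStateMoveLimit() {
    return Optional.empty(); // forest keeps every state, so no limit is needed
  }
}
```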
I removed this part. I will just keep the worldstate head update modification: https://github.com/hyperledger/besu/pull/5699/files#diff-6da2d65dd7a0564527e8ce3ad7c4482f599fb2af30e37316c47854e5aed2e977R610. I'm running a new test now with this latest version.
Signed-off-by: Ameziane H <ameziane.hamlat@consensys.net>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
I conducted the same test as @ahamlat on mainnet using this PR (after my last modification).
This makes much more sense than the prior behavior. I don't think we should have been doing a chain reorg without moving the worldstate.
🚢
```diff
   }

-  private boolean forwardWorldStateTo(final BlockHeader newHead) {
+  private boolean moveWorldStateTo(final BlockHeader newHead) {
```
This rename implies that we could use this method to move the ws "backwards", is that correct?
Yes, we can do rollback and rollforward with this code.
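Conceptually, the move works in both directions because Bonsai trie logs record state diffs that can be applied or reversed. A minimal sketch of the idea, with a hypothetical `TrieLog` interface (not Besu's actual implementation):

```java
import java.util.List;

// Conceptual sketch of a bidirectional worldstate move: roll back from the
// current head to the common ancestor, then roll forward to the new head.
class WorldStateMoveSketch {
  interface TrieLog {
    void apply(); // replay the state diff (rollforward)
    void undo();  // reverse the state diff (rollback)
  }

  void moveWorldStateTo(final List<TrieLog> rollbacks, final List<TrieLog> rollforwards) {
    // Undo diffs from the current head back to the common ancestor...
    rollbacks.forEach(TrieLog::undo);
    // ...then apply diffs from the ancestor up to the new head.
    rollforwards.forEach(TrieLog::apply);
  }
}
```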
Update the worldstate in the same way as the blockchain, in order to avoid inconsistencies between the two that would later trigger a big rollforward.

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Co-authored-by: Ameziane H <ameziane.hamlat@consensys.net>
Signed-off-by: garyschulte <garyschulte@gmail.com>
PR description
We have identified a potential issue with the FCU mechanism. Based on the logs, Besu considers each new head a reorganization (reorg) because of this line https://github.com/hyperledger/besu/blob/main/consensus/merge/src/main/java/org/hyperledger/besu/consensus/merge/blockcreation/MergeCoordinator.java#L610: we do not update the worldstate unless the new head is a direct descendant of the current head, which is rare in general. As a result, we enter the "rewindToBlock" method https://github.com/hyperledger/besu/blob/main/consensus/merge/src/main/java/org/hyperledger/besu/consensus/merge/blockcreation/MergeCoordinator.java#L633, which performs a chain reorg (in this case, a forward move of the chain) without changing the world state. With Prysm we only ever hit this case, so the chain keeps advancing without the world state advancing. Consequently, each "engineNewPayload" call has to apply a larger number of trie logs to process the new block, which eventually leads to a node crash.
Upon examining the code, it is indeed not normal to have a scenario where we change the blockchain without changing the world state.
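To make the fixed invariant concrete, here is a hedged sketch of the head-update decision. `BlockHeader` and the helper methods are illustrative stand-ins, not Besu's actual `MergeCoordinator` API; the real change is at the line linked above.

```java
// Illustrative sketch of the fixed invariant: the worldstate must move
// whenever the chain moves, including on the non-descendant (reorg) path.
class FcuHeadUpdateSketch {
  record BlockHeader(long number, String hash, String parentHash) {}

  void updateHead(final BlockHeader newHead, final BlockHeader currentHead) {
    if (newHead.parentHash().equals(currentHead.hash())) {
      forwardChainAndWorldState(newHead); // common fast path
    } else {
      // Previously the non-descendant path only rewound the chain, so each
      // engineNewPayload had to replay ever more trie logs; now the
      // worldstate is moved together with the chain.
      rewindChain(newHead);
      moveWorldStateTo(newHead);
    }
  }

  private void forwardChainAndWorldState(final BlockHeader head) { /* elided */ }
  private void rewindChain(final BlockHeader head) { /* elided */ }
  private void moveWorldStateTo(final BlockHeader head) { /* elided */ }
}
```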
Fixed Issue(s)